XML Information Retrieval Considering Physical Page Layout of Logical Elements
Abstract
XML information retrieval (XML-IR) systems use the logical structure of XML documents to retrieve relevant elements. From a practical point of view, how the search results of XML-IR systems are displayed is also important. When searching XML documents that were constructed by marking up documents originally composed of pages, such as scholarly articles or books, users would like result elements to be overlaid on the physical page layout in the user interface. We propose such a display method for keyword searches on XML documents of scholarly articles, together with ranking methods based on page units. Because multiple result elements may appear on the same page, this requires a ranking method different from those used in simple element ranking. The proposed ranking method considers both the benefit obtained from the result elements and the reading effort that must be spent on the result elements and nearby elements to understand their content.
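The abstract does not give the scoring formula, so the following is only a minimal sketch of what page-unit ranking could look like. It assumes each result element carries a numeric benefit and a numeric reading effort, and it scores a page as total benefit divided by total effort. The names `ResultElement` and `rank_pages`, and the ratio itself, are illustrative assumptions, not the authors' method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ResultElement:
    """A retrieved XML element placed on a physical page (hypothetical model)."""
    page: int              # page on which the element is laid out
    benefit: float         # assumed relevance gain from reading the element
    reading_effort: float  # assumed cost of reading it plus nearby context


def rank_pages(results):
    """Rank pages by total benefit per unit of reading effort (assumed formula)."""
    benefit = defaultdict(float)
    effort = defaultdict(float)
    # Aggregate per page, since several result elements may share one page.
    for r in results:
        benefit[r.page] += r.benefit
        effort[r.page] += r.reading_effort
    # Pages with zero total effort are skipped to avoid division by zero.
    scores = {p: benefit[p] / effort[p] for p in benefit if effort[p] > 0}
    # Highest score first; ties are broken by page number for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Under these assumptions, a page holding two moderately useful elements can rank below a page with a single element that is cheap to read, which is the kind of trade-off a page-unit ranking has to resolve.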
Similar resources
XML Retrieval
DEFINITION Text documents often contain a mixture of structured and unstructured content. One way to format this mixed content is according to the adopted W3C standard for information repositories and exchanges, the eXtensible Mark-up Language (XML). In contrast to HTML, which is mainly layout-oriented, XML follows the fundamental concept of separating the logical structure of a document from i...
Query Relaxation by Structure and Semantics for Retrieval of Logical Web Documents
Since WWW encourages hypertext and hypermedia document authoring (e.g. HTML or XML), Web authors tend to create documents that are composed of multiple pages connected with hyperlinks. A Web document may be authored in multiple ways, such as (1) all information in one physical page, or (2) a main page and the related information in separate linked pages. Existing Web search engines, however, re...
Relationships in Structured Text Retrieval
SYNONYM None DEFINITION In structured text retrieval, the relationship between text components may be used in ranking components relative to a given query. MAIN TEXT In a structured text document, there exists a relationship between the document components. In the context of XML retrieval, the relationships between elements are provided by the logical structure of the XML markup. An element, un...
Improved CHAID algorithm for document structure modelling
This paper proposes a technique for the logical labelling of document images. It makes use of a decision-tree based approach to learn and then recognise the logical elements of a page. A state-of-the-art OCR gives the physical features needed by the system. Each block of text is extracted during the layout analysis and raw physical features are collected and stored in the ALTO format. The data-...
XML Information Retrieval
Documents are increasingly marked up using XML, the format standard for structured documents. In contrast to HTML, which is mainly layout-oriented, XML follows the fundamental concept of separating the logical structure of a document from its layout. This logical document structure can be exploited to allow focused access to documents, where the aim is to return the most relevant fr...
Publication date: 2007